Closed Bug 1643689 Opened 5 years ago Closed 5 years ago

Enable manifest-scheduling on autoland

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 1 open bug, Regressed 1 open bug)

Attachments

(9 files)

Bug 1643689 - [taskgraph] enable manifest-scheduling on autoland, r?marco 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - disable 1st round of manifest scheduling. r=aryx 5 years ago Joel Maher ( :jmaher ) (UTC -8) 47 bytes, text/x-phabricator-request
Bug 1643689 - [taskgraph] Fix error in 'split_bugbug_args', r?marco 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - [taskgraph] enable manifest-scheduling on autoland, r?marco 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - Backed out changeset 10110918b6c0 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - [taskgraph] Fix taskgraph tests broken by f07222b728fa, r=me 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r?jmaher 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r?jmaher 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request
Bug 1643689 - [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r?jmaher 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request

Andrew Halberstadt [:ahal]

Assignee

Description

•

5 years ago

Now that the initial implementation of 'manifest-scheduling' has landed, this bug will track turning it on for autoland.

Solving backfills will be the major blocker here, though we'll also need to ensure we don't regress Push Health in a major way.

Marco Castelluccio [:marco]

Comment 1

•

5 years ago

To avoid regressions in the quality of sheriffs' classifications, we should probably:

- enforce one backout per push (so we avoid https://github.com/mozilla/mozci/issues/204); we might want to do bug 1636440 before enforcing this;
- organize a "training" session with sheriffs to explain the changes.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

5 years ago

Summary: Enable the 'bugbug' manifest loader on autoland → Enable manifest-scheduling on autoland

Andrew Halberstadt [:ahal]

Assignee

Updated

•

5 years ago

Depends on: 1654591

Andrew Halberstadt [:ahal]

Assignee

Updated

•

5 years ago

Depends on: 1639873

Andrew Halberstadt [:ahal]

Assignee

Comment 2

•

5 years ago

Attached file Bug 1643689 - [taskgraph] enable manifest-scheduling on autoland, r?marco — Details

Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.
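
For readers unfamiliar with how a per-repository setting like this might be expressed, here is a minimal sketch. It is not the patch itself: the table name, the `get_parameters` helper, and the 'default' loader name are assumptions made for illustration; only the 'test_manifest_loader' key and the 'bugbug' value come from this bug.

```python
# Hypothetical per-project overrides; only autoland opts in to 'bugbug'.
PER_PROJECT_PARAMETERS = {
    "autoland": {"test_manifest_loader": "bugbug"},
}


def get_parameters(project):
    """Build the parameters for a project (illustrative helper)."""
    # Assumption: every other project keeps a loader called 'default'.
    params = {"test_manifest_loader": "default"}
    params.update(PER_PROJECT_PARAMETERS.get(project, {}))
    return params


print(get_parameters("autoland")["test_manifest_loader"])
print(get_parameters("mozilla-central")["test_manifest_loader"])
```

Keeping the switch in a single per-project table would make a trial like this one easy to back out, since only one entry changes.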

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → ahal

Status: NEW → ASSIGNED

Andrew Halberstadt [:ahal]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

5 years ago

We're planning to enable this tomorrow for a trial run to get a sense of:

A) Is everything working as it should (since this is hard to test on try).
B) How much of an impact does this have on sheriffing (and what we need to do to fix it).

We'll run the experiment until Thursday, July 30th, or until it becomes obvious that it makes sheriffing too difficult (which may take only an hour or two). If it turns out that sheriffs have no complaints and everything goes smoothly, there's a small chance that we'll leave it enabled past the end date.

Pulsebot

Comment 4

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9be5f086895c [taskgraph] enable manifest-scheduling on autoland, r=marco

Bogdan Tara[:bogdan_tara | bogdant]

Updated

•

5 years ago

Regressions: 1655807

Bogdan Tara[:bogdan_tara | bogdant]

Comment 5

•

5 years ago

Backed out changeset 9be5f086895c (bug 1643689) for busting gecko decision task and causing bug 1655807

Backout link: https://hg.mozilla.org/integration/autoland/rev/153accc0eb12651fa1b2d19ec1dc89c6cc6477d3

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=311287166&repo=autoland

...
[task 2020-07-28T16:24:16.329Z] Generating tasks for release-update-verify-next firefox-next-win32
[task 2020-07-28T16:24:16.329Z] Generated 0 tasks for kind release-update-verify-next
[task 2020-07-28T16:24:16.369Z] Generating full task graph
[task 2020-07-28T16:24:16.448Z] Full task graph contains 24419 tasks and 105201 dependencies
[task 2020-07-28T16:24:21.768Z] PERFHERDER_DATA: {"suites": [{"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 20.07702398099991, "name": "bugbug_push_schedules_time"}, {"lowerIsBetter": true, "subtests": [], "shouldAlert": false, "value": 2, "name": "bugbug_push_schedules_retries"}], "framework": {"name": "build_metrics"}}
[task 2020-07-28T16:24:21.768Z] Traceback (most recent call last):
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/mach_commands.py", line 205, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z]     return taskgraph.decision.taskgraph_decision(options)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 251, in taskgraph_decision
[task 2020-07-28T16:24:21.768Z]     full_task_json = tgg.full_task_graph.to_json()
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 163, in full_task_graph
[task 2020-07-28T16:24:21.768Z]     return self._run_until('full_task_graph')
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 374, in _run_until
[task 2020-07-28T16:24:21.768Z]     k, v = next(self._run)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/generator.py", line 304, in _run
[task 2020-07-28T16:24:21.768Z]     yield verifications('full_task_graph', full_task_graph, graph_config, parameters)
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 58, in __call__
[task 2020-07-28T16:24:21.768Z]     parameters=parameters,
[task 2020-07-28T16:24:21.768Z]   File "/builds/worker/checkouts/gecko/taskcluster/taskgraph/util/verify.py", line 364, in verify_test_packaging
[task 2020-07-28T16:24:21.768Z]     raise Exception("\n".join(exceptions))
[task 2020-07-28T16:24:21.768Z] Exception: Build job build-linux64-tsan/opt has no tests, but specifies MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment. Unset MOZ_AUTOMATION_PACKAGE_TESTS in the task definition to fix.
[taskcluster 2020-07-28 16:24:23.303Z] === Task Finished ===
[taskcluster 2020-07-28 16:24:44.491Z] Unsuccessful task run with exit code: 1 completed in 198.821 seconds

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

5 years ago

We decided to back out the trial. The issue happened because the algorithm decided no tests needed to run against that build, which tripped this check:
https://searchfox.org/mozilla-central/rev/d9f92154813fbd4a528453c33886dc3a74f27abb/taskcluster/taskgraph/util/verify.py#358

I think we may need to disable this check if manifest-scheduling mode is enabled.
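
The idea can be sketched roughly as follows. This is a simplified illustration, not the code in verify.py: the function signature, the `manifest_scheduling` flag, and the dict shapes used for builds and tests are all assumptions.

```python
def verify_test_packaging(builds, tests, manifest_scheduling=False):
    """Return error messages for builds that package tests nobody runs.

    Hypothetical shapes: `builds` maps a build label to its environment
    dict, `tests` maps a test label to the label of the build it uses.
    """
    if manifest_scheduling:
        # Assumption: with manifest-scheduling enabled, a build can
        # legitimately end up with no tests, so the check is skipped.
        return []

    builds_with_tests = set(tests.values())
    errors = []
    for label, env in sorted(builds.items()):
        packages_tests = env.get("MOZ_AUTOMATION_PACKAGE_TESTS") == "1"
        if packages_tests and label not in builds_with_tests:
            errors.append(
                "Build job {} has no tests, but specifies "
                "MOZ_AUTOMATION_PACKAGE_TESTS=1 in the environment.".format(label)
            )
    return errors
```

With the flag off, a build such as build-linux64-tsan/opt that packages tests but has none scheduled would be reported; with the flag on, the same input passes.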

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Assignee

Updated

•

5 years ago

Depends on: 1655978

Pulsebot

Comment 7

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9e7c9323a832 [taskgraph] enable manifest-scheduling on autoland, r=marco

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 8

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/9e7c9323a832

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

5 years ago

Attached file Bug 1643689 - disable 1st round of manifest scheduling. r=aryx — Details

disable 1st round of manifest scheduling

Pulsebot

Comment 10

•

5 years ago

Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0bd8e8a498b1 disable 1st round of manifest scheduling. r=aryx

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 11

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0bd8e8a498b1

Andrew Halberstadt [:ahal]

Assignee

Comment 12

•

5 years ago

Attached file Bug 1643689 - [taskgraph] Fix error in 'split_bugbug_args', r?marco — Details

The dict needs to be passed to the last two substrategies, not just the last
one.
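
A rough sketch of what such a fix could look like is below. The real 'split_bugbug_args' signature is not shown in this bug, so the helper name, the substrategy count parameter, and the choice to share a single dict object are all assumptions.

```python
def split_args_for_substrategies(arg, num_substrategies):
    """Return one argument per substrategy (hypothetical helper).

    Earlier substrategies receive `arg` unchanged. The last two receive
    the same dict object, so that they can communicate through it.
    """
    if num_substrategies < 2:
        raise ValueError("expected at least two substrategies")
    shared = {}
    return [arg] * (num_substrategies - 2) + [shared, shared]


args = split_args_for_substrategies("some-arg", 3)
print(args[1] is args[2])
```

Passing the dict only to the final substrategy would leave the second-to-last one without access to the shared state, which is the kind of error the patch description refers to.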

Andrew Halberstadt [:ahal]

Assignee

Comment 13

•

5 years ago

Attached file Bug 1643689 - [taskgraph] enable manifest-scheduling on autoland, r?marco — Details

Sets autoland to use the 'bugbug' test manifest loader. This is being enabled
as part of a temporary trial to see the impact it has on sheriffing.

Depends on D90159

Pulsebot

Comment 14

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/10110918b6c0 [taskgraph] Fix error in 'split_bugbug_args', r=marco https://hg.mozilla.org/integration/autoland/rev/0b196026ed59 [taskgraph] enable manifest-scheduling on autoland, r=marco

Alexandru Michis [:malexandru]

Comment 15

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/10110918b6c0
https://hg.mozilla.org/mozilla-central/rev/0b196026ed59

Pulsebot

Comment 16

•

5 years ago

Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/081af369ed79 Backed out changeset 0b196026ed59 for causing issues with manifest scheduling.

Alexandru Michis [:malexandru]

Comment 17

•

5 years ago

Backed out changeset 0b196026ed59 (Bug 1643689) for causing issues with manifest scheduling.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&fromchange=ad0e25b984f421119fc948c1e72660c6df5f696d&tochange=f3b0c35583ebeb8e67dcb3c9b963773237d9c5a6

Here we can see failed backfill tasks:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315918092&repo=autoland&lineNumber=50

Also failed "dt" jobs:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315917273&repo=autoland&lineNumber=2041

Bogdan Tara[:bogdan_tara | bogdant]

Comment 18

•

5 years ago

The backout also seems to have fixed these a11y failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=linux%2C18.04%2Cx64%2Cdebug%2Cmochitests%2Cwithout%2Ce10s%2Ctest-linux1804-64%2Fdebug-mochitest-a11y-1proc%2Ca11y&fromchange=75f7048e9d0af7f1ef5ff34fc6a807ecfb0fbd86&test_paths=accessible%2Ftests%2Fmochitest%2F&tochange=338bbaf179ae30128893b2acf18ba7e2b417034d&selectedTaskRun=UjVLUNLmRK-RWoRwPAysUQ.0

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

5 years ago

Regressions: 1665631

Andrew Halberstadt [:ahal]

Assignee

Comment 19

•

5 years ago

Attached file Bug 1643689 - Backed out changeset 10110918b6c0 — Details

This was causing |mach try auto| to stop selecting manifests.

Marco Castelluccio [:marco]

Updated

•

5 years ago

Regressions: 1665585

Pulsebot

Comment 20

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a974a223222b Backed out changeset 10110918b6c0

Pulsebot

Comment 21

•

5 years ago

Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/mozilla-central/rev/084477976b2d Backed out changeset 10110918b6c0. a=Aryx

Alexandru Michis [:malexandru]

Comment 22

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a974a223222b

Pulsebot

Comment 23

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f07222b728fa [taskgraph] Fix error in 'split_bugbug_args', r=marco

Andrew Halberstadt [:ahal]

Assignee

Comment 24

•

5 years ago

Attached file Bug 1643689 - [taskgraph] Fix taskgraph tests broken by f07222b728fa, r=me — Details

Pulsebot

Comment 25

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bb594cb9abe3 [taskgraph] Fix taskgraph tests broken by f07222b728fa,

Andreea Pavel [:apavel]

Comment 26

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f07222b728fa
https://hg.mozilla.org/mozilla-central/rev/bb594cb9abe3

Andrew Halberstadt [:ahal]

Assignee

Comment 27

•

5 years ago

Attached file Bug 1643689 - [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r?jmaher — Details

Andrew Halberstadt [:ahal]

Assignee

Comment 28

•

5 years ago

Attached file Bug 1643689 - [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r?jmaher — Details

Depends on D91587

Andrew Halberstadt [:ahal]

Assignee

Comment 29

•

5 years ago

Attached file Bug 1643689 - [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r?jmaher — Details

Enabling manifest scheduling revealed several interdependencies between tests,
resulting in too many new intermittents. Disable manifest-scheduling for
'mochitest-a11y' for now.

Depends on D91588
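
Taken together, this patch stack suggests a precedence rule in which a task-level setting wins over the globally configured loader. A minimal sketch of that rule follows; the task-level key spelling 'test-manifest-loader', the 'default' loader name, and the helper itself are assumptions, not the landed code.

```python
DEFAULT_LOADER = "default"  # assumed name for the non-bugbug loader


def resolve_manifest_loader(task, parameters):
    """Pick the manifest loader for a task (illustrative only).

    A task-level 'test-manifest-loader' key (hypothetical spelling) takes
    precedence; otherwise the configured 'test_manifest_loader' parameter
    is used, falling back to DEFAULT_LOADER.
    """
    override = task.get("test-manifest-loader")
    if override is not None:
        return override
    return parameters.get("test_manifest_loader", DEFAULT_LOADER)


params = {"test_manifest_loader": "bugbug"}
a11y = {"suite": "mochitest-a11y", "test-manifest-loader": "default"}
print(resolve_manifest_loader(a11y, params))
print(resolve_manifest_loader({"suite": "mochitest-plain"}, params))
```

Under this scheme a suite such as 'mochitest-a11y' can opt out of manifest-scheduling in its own task definition, without a separate blacklist of suites.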

Pulsebot

Comment 30

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6c2a31b47d0b [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r=jmaher https://hg.mozilla.org/integration/autoland/rev/50195a6883bf [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r=jmaher https://hg.mozilla.org/integration/autoland/rev/2912d91dd291 [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r=jmaher

Dorel Luca [:dluca]

Comment 31

•

5 years ago

•

Edited

Backed out 3 changesets (bug 1643689) for Gecko Decision Task failure. CLOSED TREE

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=317051153&repo=autoland&lineNumber=1801

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=2912d91dd291de83873211fa4d017d6546551322

Backout:
https://hg.mozilla.org/integration/autoland/rev/0ca25be8f4f8d43e3673dcd545689c3e1663fab0

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Assignee

Comment 32

•

5 years ago

This is very bizarre: I couldn't reproduce it on try, and I can't reproduce it locally, even on the exact same base revision and using parameters.yml from autoland...

I also tried running it with an earlier Python version in case that was the issue, but still no luck.

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Assignee

Comment 33

•

5 years ago

facepalm

It's because I had already fixed the issue locally, but I guess I never ended up submitting the changes to Phabricator.

Pulsebot

Comment 34

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/906c9cf29da7 [taskgraph] Allow tasks to override the configured 'test_manifest_loader', r=jmaher https://hg.mozilla.org/integration/autoland/rev/1b0858fe5cf2 [taskgraph] Replace 'CHUNK_SUITES_BLACKLIST' with the 'test_manifest_loader' key, r=jmaher https://hg.mozilla.org/integration/autoland/rev/0cceb980f44e [ci] Ensure 'mochitest-a11y' doesn't run with manifest-scheduling enabled, r=jmaher

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 35

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/906c9cf29da7
https://hg.mozilla.org/mozilla-central/rev/1b0858fe5cf2
https://hg.mozilla.org/mozilla-central/rev/0cceb980f44e

Pulsebot

Comment 36

•

5 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/23bb4efd11b9 [taskgraph] enable manifest-scheduling on autoland, r=marco

Cristian Brindusan [:cbrindusan]

Comment 37

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/23bb4efd11b9

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

5 years ago

Regressions: 1669432

Andrew Halberstadt [:ahal]

Assignee

Comment 39

•

5 years ago

I believe we are all done here. Regressions / follow-up work is all tracked in other bugs.

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Keywords: leave-open

Resolution: --- → FIXED

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

5 years ago

Depends on: 1672967

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

5 years ago

Depends on: 1673050
